Anchor points for genome alignment based on Filtered Spaced Word Matches
Abstract
Alignment of large genomic sequences is a fundamental task in computational genome analysis. Most methods for genomic alignment use high-scoring local alignments as anchor points to reduce the search space of the alignment procedure. The speed and quality of these methods therefore depend on the underlying anchor points. Herein, we propose to use Filtered Spaced Word Matches to calculate anchor points for genome alignment. To evaluate this approach, we used these anchor points in the widely used alignment pipeline Mugsy. For distantly related sequence sets, we could substantially improve the quality of alignments produced by Mugsy.
Similar resources
Fast and accurate phylogeny reconstruction using filtered spaced-word matches
Motivation: Word-based or 'alignment-free' algorithms are increasingly used for phylogeny reconstruction and genome comparison, since they are much faster than traditional approaches that are based on full sequence alignments. Existing alignment-free programs, however, are less accurate than alignment-based methods. Results: We propose Filtered Spaced Word Matches (FSWM), a fast alignment-free...
Fast alignment-free sequence comparison using spaced-word frequencies
Motivation: Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free ...
A Novel Pseudo-Alignment Approach to Fast Genomic Sequence Comparison
Standard methods for sequence analysis and phylogeny reconstruction are based on (multiple) sequence alignments. These methods are known to be accurate, but they reach their limits when larger genomic sequences are to be analysed. Consequently, faster but less precise alignment-free methods are increasingly used for genomic sequence analysis. In this work, a novel approach to fast genomic sequence...
Improved sensitivity and reliability of anchor based genome alignment
Whole genome alignment is a challenging problem in computational comparative genomics. It is essential for the functional annotation of genomes, the understanding of their evolution, and for phylogenomics. Many global alignment programs are heuristic variations on the anchor based strategy, which relies on the initial detection of similarities and their selection in an ordered chain. Considerin...
Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches
In this article, we present a user-friendly web interface for two alignment-free sequence-comparison methods that we recently developed. Most alignment-free methods rely on exact word matches to estimate pairwise similarities or distances between the input sequences. By contrast, our new algorithms are based on inexact word matches. The first of these approaches uses the relative frequencies of...
Publication date: 2017